Supervised Learning

Given the “right answer” for each example in the data.

Regression Problem

Predict real-valued output.
Training set of housing prices

Size in feet² (x)    Price ($) in 1000's (y)
2104                 460
1416                 232
1534                 315
852                  178
...                  ...

Notation:

m = Number of training examples

x’s = “input” variable / features

y’s = “output” variable / “target” variable

(x, y) is one training example
(x^(i), y^(i)) is the i-th training example
Estimated price (estimated value of y)

h is a function that maps from x’s to y’s
How do we represent h?

Hypothesis:

$h_\theta(x) = \theta_0 + \theta_1 x$

Linear regression with one variable.
Univariate linear regression.
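As a minimal sketch, the hypothesis can be written as a small Python function. The function name `hypothesis` and the parameter values below are assumptions chosen for illustration; they are not fitted to the housing data.

```python
def hypothesis(theta0: float, theta1: float, x: float) -> float:
    """Univariate linear hypothesis h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


# Illustrative (not fitted) parameter values, applied to the house sizes
# from the training set.
theta0, theta1 = 50.0, 0.1
for size in (2104, 1416, 1534, 852):
    print(size, hypothesis(theta0, theta1, size))
```

Different choices of theta0 and theta1 give different straight lines, which is why the next step is a way to score how well a given pair fits the data.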



Linear regression 
with one variable 

Cost function 

Machine Learning 
Training Set

Hypothesis, which we use to make predictions:

$h_\theta(x) = \theta_0 + \theta_1 x$

Parameters: $\theta_0, \theta_1$

How to choose the $\theta_i$’s?

This will let us figure out how to fit the best possible straight line to our data.

Dr Sherin ElGokhy 


[Figure: three example plots of $h_\theta(x) = \theta_0 + \theta_1 x$, for ($\theta_0 = 1.5$, $\theta_1 = 0$), ($\theta_0 = 0$, $\theta_1 = 0.5$), and ($\theta_0 = 1$, $\theta_1 = 0.5$)]





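Cost Function:

$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

Goal: $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$

Idea: Choose $\theta_0, \theta_1$ so that $h_\theta(x)$ is close to $y$ for our training examples $(x, y)$.

Minimization problem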
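The squared-error cost can be sketched in a few lines of Python. The function name `cost` and the trial parameters (0.0, 0.2) are assumptions for illustration; the data are the four rows of the housing table.

```python
def cost(theta0, theta1, xs, ys):
    """Squared-error cost J(theta0, theta1) = 1/(2m) * sum((h(x) - y)^2)."""
    m = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        error = (theta0 + theta1 * x) - y   # h_theta(x^(i)) - y^(i)
        total += error ** 2
    return total / (2 * m)


sizes = [2104, 1416, 1534, 852]     # x: size in square feet
prices = [460, 232, 315, 178]       # y: price in 1000s of dollars
print(cost(0.0, 0.2, sizes, prices))
```

A smaller value of J means the line is closer to the training examples on average.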
Linear regression
with one variable

Cost function
intuition I
Hypothesis:

$h_\theta(x) = \theta_0 + \theta_1 x$

Parameters:

$\theta_0, \theta_1$

Cost Function:

$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

Goal: $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$

Simplified (with $\theta_0 = 0$):

$h_\theta(x) = \theta_1 x$

$J(\theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

Goal: $\min_{\theta_1} J(\theta_1)$

[Figure: paired plots. Left: $h_\theta(x)$ (for fixed $\theta_1$, this is a function of $x$). Right: $J(\theta_1)$ (function of the parameter $\theta_1$).]
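To make the simplified cost concrete, the sketch below evaluates J(theta1) at a few values of theta1. The three-point data set (1, 1), (2, 2), (3, 3) is an assumption chosen for illustration, as is the function name `cost_simplified`. Because these points lie exactly on the line y = x, the cost is zero at theta1 = 1 and grows as theta1 moves away from 1 in either direction.

```python
def cost_simplified(theta1, xs, ys):
    """Cost J(theta1) for the simplified hypothesis h(x) = theta1 * x."""
    m = len(xs)
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


# Hypothetical toy data lying exactly on the line y = x.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

for theta1 in (0.0, 0.5, 1.0, 1.5, 2.0):
    print(f"theta1 = {theta1:.1f}  J = {cost_simplified(theta1, xs, ys):.4f}")
```

Plotting these values against theta1 gives the bowl-shaped curve that the right-hand panel describes.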
Linear regression
with one variable

Cost function
intuition II
Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$

Parameters: $\theta_0, \theta_1$

Cost Function: $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

Goal: $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$

[Figures: paired plots. Left: $h_\theta(x)$ drawn over the training data, Price ($) in 1000s against Size (feet²) (for fixed $\theta_0, \theta_1$, this is a function of $x$). Right: $J(\theta_0, \theta_1)$ (function of the parameters $\theta_0, \theta_1$).]
Linear regression
with one variable

Gradient
descent

Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$

Parameters: $\theta_0, \theta_1$

Cost Function: $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

Goal: $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$
The gradient descent algorithm is used to minimize the objective function

$\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$

Outline:

• Start with some initial $\theta_0, \theta_1$

• Keep changing $\theta_0, \theta_1$ to reduce $J(\theta_0, \theta_1)$ until we hopefully end up at a minimum
repeat until convergence {

    $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$    (for j = 0 and j = 1)

}

“Until convergence” means until we end up at a local minimum.

$\alpha$ is the learning rate, which controls how big a step we take downhill with gradient descent.


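Correct: Simultaneous update

temp0 := $\theta_0 - \alpha \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1)$
temp1 := $\theta_1 - \alpha \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1)$
$\theta_0$ := temp0
$\theta_1$ := temp1

Incorrect:

temp0 := $\theta_0 - \alpha \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1)$
$\theta_0$ := temp0
temp1 := $\theta_1 - \alpha \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1)$
$\theta_1$ := temp1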
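The difference between the two orderings can be seen in a short Python sketch. It assumes the squared-error cost for linear regression, so the partial derivatives are the averaged errors; the helper names and the three-point toy data set are hypothetical, chosen only to make the numbers easy to follow.

```python
def gradients(theta0, theta1, xs, ys):
    """Partial derivatives of the squared-error cost J(theta0, theta1)."""
    m = len(xs)
    errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
    d0 = sum(errors) / m
    d1 = sum(e * x for e, x in zip(errors, xs)) / m
    return d0, d1


def step_simultaneous(theta0, theta1, alpha, xs, ys):
    """Correct: both derivatives are evaluated at the old parameters."""
    d0, d1 = gradients(theta0, theta1, xs, ys)
    temp0 = theta0 - alpha * d0
    temp1 = theta1 - alpha * d1
    return temp0, temp1


def step_sequential(theta0, theta1, alpha, xs, ys):
    """Incorrect: theta1's derivative sees the already-updated theta0."""
    d0, _ = gradients(theta0, theta1, xs, ys)
    theta0 = theta0 - alpha * d0
    _, d1 = gradients(theta0, theta1, xs, ys)
    theta1 = theta1 - alpha * d1
    return theta0, theta1


# Hypothetical toy data on the line y = x.
xs, ys = [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]
sim = step_simultaneous(0.0, 0.0, 0.1, xs, ys)
seq = step_sequential(0.0, 0.0, 0.1, xs, ys)
print("simultaneous:", sim)
print("sequential:  ", seq)
```

Both versions produce the same new theta0, but the new theta1 differs, because the sequential version computes its second derivative at a point that gradient descent never intended to evaluate.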
Intuition of derivative term

$\theta_1 := \theta_1 - \alpha \frac{d}{d\theta_1} J(\theta_1)$

If $\alpha$ is too small, gradient descent can be slow.

If $\alpha$ is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.
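The effect of the learning rate can be sketched on a toy problem. The cost J(theta1) = theta1² (derivative 2·theta1, minimum at 0) is an assumption used only because its behaviour is easy to verify by hand; the function name `run` and the three learning rates are also illustrative.

```python
def run(alpha, theta1=1.0, steps=10):
    """Run gradient descent on the toy cost J(theta1) = theta1**2."""
    history = [theta1]
    for _ in range(steps):
        derivative = 2.0 * theta1            # dJ/dtheta1 for J = theta1**2
        theta1 = theta1 - alpha * derivative
        history.append(theta1)
    return history


for alpha in (0.01, 0.4, 1.1):
    final = run(alpha)[-1]
    print(f"alpha = {alpha}: theta1 after 10 steps = {final:.4f}")
```

With the small rate, theta1 has barely moved from its starting value after ten steps; with the moderate rate it is very close to the minimum; with the large rate each step overshoots by more than the last, so the magnitude of theta1 grows instead of shrinking.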
[Figure: $J(\theta_1)$ with the current value of $\theta_1$ at a local optimum]

$\theta_1 := \theta_1 - \alpha \frac{d}{d\theta_1} J(\theta_1)$

At a local optimum the derivative is zero, so the update leaves $\theta_1$ unchanged.
Gradient descent can converge to a local minimum, even with the learning rate $\alpha$ fixed.

$\theta_1 := \theta_1 - \alpha \frac{d}{d\theta_1} J(\theta_1)$

As we approach a local minimum, gradient descent will automatically take smaller steps. So, no need to decrease $\alpha$ over time.
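A short sketch shows the steps shrinking while the learning rate stays fixed. It again assumes the toy cost J(theta1) = theta1², whose derivative 2·theta1 gets smaller as theta1 approaches the minimum at 0; the function name `step_sizes` and the value alpha = 0.25 are illustrative choices.

```python
def step_sizes(alpha, theta1, steps):
    """Sizes of successive gradient descent steps on J(theta1) = theta1**2."""
    sizes = []
    for _ in range(steps):
        step = alpha * 2.0 * theta1    # alpha * dJ/dtheta1
        theta1 -= step
        sizes.append(abs(step))
    return sizes


sizes = step_sizes(alpha=0.25, theta1=1.0, steps=5)
print(sizes)   # prints [0.5, 0.25, 0.125, 0.0625, 0.03125]
```

Each step is half the previous one even though alpha never changes, because the derivative itself shrinks near the minimum.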
Gradient descent algorithm

repeat until convergence {

    $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$    (for j = 1 and j = 0)

}
Linear Regression Model

$h_\theta(x) = \theta_0 + \theta_1 x$

$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

The gradient descent algorithm is used to minimize the objective function

$\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$
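Applying gradient descent to this model requires the two partial derivatives of J. The following derivation is a sketch using only the chain rule and the definitions of the hypothesis and the squared-error cost.

```latex
\frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)
  = \frac{1}{2m} \sum_{i=1}^{m} 2 \left( h_\theta(x^{(i)}) - y^{(i)} \right)
    \frac{\partial}{\partial \theta_j} h_\theta(x^{(i)})
  = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)
    \frac{\partial}{\partial \theta_j} h_\theta(x^{(i)})

% Since h_\theta(x) = \theta_0 + \theta_1 x:
\frac{\partial}{\partial \theta_0} h_\theta(x^{(i)}) = 1,
\qquad
\frac{\partial}{\partial \theta_1} h_\theta(x^{(i)}) = x^{(i)}

% Therefore:
\frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1)
  = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)

\frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1)
  = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}
```

The factor of 2 from differentiating the square cancels the 1/2 in the cost, which is why the cost is defined with 1/(2m) rather than 1/m.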
Gradient descent algorithm

repeat until convergence {

    $\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$

    $\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$

}

Update $\theta_0$ and $\theta_1$ simultaneously.
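The two update rules can be put together in a minimal Python sketch. Several things here are assumptions, not part of the slides: the function name `batch_gradient_descent`, the starting point (0, 0), the fixed iteration count standing in for "repeat until convergence", the value alpha = 0.1, and the toy data generated from y = 1 + 2x.

```python
def batch_gradient_descent(xs, ys, alpha=0.1, iterations=2000):
    """Fit h(x) = theta0 + theta1 * x by gradient descent on the squared-error cost."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0            # assumed starting point
    for _ in range(iterations):          # fixed count instead of a convergence test
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update: both gradients were computed from the old values.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1


# Hypothetical toy data generated from y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
theta0, theta1 = batch_gradient_descent(xs, ys)
print(theta0, theta1)
```

On this data the result should be close to theta0 = 1 and theta1 = 2. Running the same code on the raw housing sizes would likely need a much smaller alpha or rescaled inputs, since sizes in the thousands make the gradient for theta1 very large.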
$J(\theta_0, \theta_1)$

For linear regression, the squared-error cost function is always a convex function: it has no local minima other than the global minimum.
[Figures: a sequence of paired plots. Left: $h_\theta(x)$ (for fixed $\theta_0, \theta_1$, this is a function of $x$). Right: $J(\theta_0, \theta_1)$ (function of the parameters $\theta_0, \theta_1$).]
“Batch” Gradient Descent

“Batch”: Each step of gradient descent uses all the training examples.
Report

Due date: Next Tuesday

Write a MATLAB function for the gradient descent algorithm.

Good luck